MODELING THE TEMPORAL RELATIONSHIP OF CASUALTY
REPORTS TO THE OPERATIONAL PROPULSION PLANT EXAM
Robert R. Read and Lyn R. Whitaker*

Abstract

This report applies modern categorical data analysis to the problem of describing the probability laws of casualty reports of United States ships of the line in relation to the type of casualty and the temporal nearness of the Operational Propulsion Plant Exam. It aims to set an example of how data of this type can be analyzed, to treat questions relating to competing modes of analysis, and to provide direction in the use of currently available software.


1. INTRODUCTION

This report applies modern categorical data analysis to the problem of describing the probability laws of the casualty reports (CASREPTs) of United States ships of the line in relation to the type of casualty (engineering or nonengineering) and the temporal nearness of the Operational Propulsion Plant Exam (OPPE). It is postulated that preparation for the OPPE drains resources from normal maintenance operations in a way that induces an increase in the number of casualty reports as the time of the exam approaches. It is also postulated that the number of casualty reports diminishes monotonically with time in the post-exam period as the system recovers from this effect. Further, the effect may be different for engineering and nonengineering casualty reports.

* This research was supported by course development funds of the Naval Postgraduate School, Monterey, California.

The Navy created the Propulsion Examining Board (PEB) in 1972. It was tasked with inspecting the propulsion plants of the Navy's surface ships. The OPPE exam is first conducted approximately fifteen months after a ship has completed a regular overhaul, and is repeated about every fifteen months thereafter until the ship again enters the overhaul state. The PEB has the authority to "tie up" a ship which, in its opinion, has an engineering plant that is not safe to operate or does not have enough qualified engineering watchstanders to operate it properly. Each fleet, Atlantic and Pacific, controls its own PEB, and there may be differences in policies that affect the results.¹

This study is restricted to frigates, destroyers and cruisers in each fleet possessing a 1200 PSI steam engineering plant. The time period is January 1974 to July 1978, and only those CASREPTs with a C-3 or C-4 readiness code are considered. The data are extracted from Tables 18 and 19 of the master's thesis of F. J. Klingseis (1979), who obtained them from CNA. The thesis mentions other data caveats as well. These data show some discrepancies with those of his Table 16. There is no way to resolve the discrepancies, since the data are old. Nonetheless we pursue the development. Our goal is to set an example of how data of this type may be analyzed, to treat questions relating to competing modes of analysis, and to provide direction in the use of currently available software. Implementation of the methods for current use is left to others.

¹ Beginning early in 1992, the inspections for the two fleets will be made identical.

The raw data appear in Table 1. They may be viewed as six-dimensional, with

$$x_{ijklrs} \qquad (1)$$

representing the number of ships with exactly (r-1) = 0, 1, 2, or 3 or more CASREPTs in month s (s = 1, ..., 6) measured before (i = 1) or after (i = 2) the date of the OPPE; the CASREPTs having been typed as engineering (k = 1) or nonengineering (k = 2); for ships of the class frigates, destroyers, or cruisers (l = 1, 2, 3 resp.); and belonging to the Atlantic (j = 1) or Pacific (j = 2) fleet. Thus there are 2 × 2 × 2 × 3 × 4 × 6 = 576 cells of counts. A visual inspection of Table 1 is not very revealing. The number of ships by fleet in each class is treated as fixed by design. Specifically,

$$x_{ijkl+s} = \sum_{r=1}^{4} x_{ijklrs} = N_{jl} \qquad (2)$$

and these values appear in Table 2. It is important to note that we do not have information on individual ships, only the totals for ship class by fleet. This kind of aggregation is somewhat unsettling, as much detail is lost.

The first round of analyses consists of the elementary and naive ones. These treat the data as 24 separate 4 by 6 (frequency categories by months) contingency tables, one for each combination of before/after, fleet (two), casualty type (two) and ship class (three). Using the basic chi-square test, the hypothesis of a common distribution over the months can be accepted. It is also true that 24 separate loglinear models which treat the months as ordinal data provide equally acceptable fits to the data. Moreover, the latter models exhibit the monotone change in frequency of casualties originally hypothesized. These studies are presented and discussed further in Section 2, following this introduction. The main body of the report appears in Section 3, where a modern loglinear model is selected to describe the entire six-dimensional data set. It is shown there that any reasonable loglinear model must include the casualty count by month interaction term.

TABLE 1. TWENTY-FOUR 4x6 FREQUENCY TABLES OF CASREPTS
[Table body not recoverable from this copy. Panels: Atlantic and Pacific fleets; Frigates, Destroyers and Cruisers, each Before and After the OPPE; Engineering and Nonengineering casualties.]

Section 4 contains another model-building effort based upon a specialized collapsing of the original data set. It provides some rather interesting contrasts and is used largely for a logit analysis of the engineering versus nonengineering CASREPTs. The results are summarized in Section 5. Annotated SAS code for the developments in Section 3 is presented in Appendix A. The details of fitting censored Poisson and geometric distributions to the 24 separate frequency tables appear in Appendix B.

TABLE 2. NUMBER OF SHIPS BY CLASS AND FLEET ($N_{jl}$)

              FF    DD    CG
Atlantic      44    29     8
Pacific       40    18    10

2. ELEMENTARY ANALYSES: TWENTY-FOUR SEPARATE CASES

The standard procedure for testing whether the six months have a common four-point distribution can be found in any basic statistics text, e.g., Agresti (1990). The test statistics have an asymptotic chi-squared distribution with (4 - 1)(6 - 1) = 15 degrees of freedom when the null hypotheses are true. If the 24 data sets are independent and all null hypotheses are valid, then the p-values of the tests form (approximately, since the chi-square distribution is itself an approximation) a random sample from a Uniform [0,1] distribution. This is a consequence of the probability integral transformation. Thus, the simultaneous validity of all 24 null hypotheses may be tested with a Kolmogorov-Smirnov test for uniformity of the distribution of the p-values. Here, p stands for the empirical significance level: the probability of a result at least as extreme if $H_0$ were true.
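The test described above can be sketched in Python. This is our own illustration, not the authors' code, and the helper names `homogeneity_chi2` and `chi2_cdf` are hypothetical. The sketch computes the Pearson chi-square statistic for a common column distribution in an r × c table (months as columns, 0, 1, 2, 3+ CASREPTs as rows) and converts it to a significance value 1 - p using the series for the regularized incomplete gamma function. It assumes every row and column total is positive. As a sanity check, it can be run against a matched pair of entries from Tables 3 and 4 (28.84 on 15 degrees of freedom, 1 - p = .983).

```python
import math


def chi2_cdf(x, df, max_terms=1000):
    """P(X <= x) for X ~ chi-square(df), i.e. the significance value 1 - p.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, z) with a = df/2 and z = x/2.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))  # n = 0 term
    total = term
    for n in range(1, max_terms):
        term *= z / (a + n)          # ratio of successive series terms
        total += term
        if term < 1e-15 * total:
            break
    return min(total, 1.0)


def homogeneity_chi2(table):
    """Pearson statistic and degrees of freedom for H0: every column of the
    table (e.g. every month) has the same distribution over the rows
    (e.g. 0, 1, 2, 3+ CASREPTs). Assumes all row and column totals are > 0."""
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    grand = sum(row_tot)
    stat = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            expected = row_tot[i] * col_tot[j] / grand
            stat += (table[i][j] - expected) ** 2 / expected
    return stat, (n_rows - 1) * (n_cols - 1)


# A 4 x 6 table has (4 - 1) * (6 - 1) = 15 degrees of freedom.
print(round(chi2_cdf(28.84, 15), 3))  # first FF "before" entry of Tables 3 and 4
```

Applying `homogeneity_chi2` to each of the 24 panels of Table 1, followed by `chi2_cdf`, would give the kind of entries shown in Tables 3 and 4.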

The 24 test statistics appear in Table 3 below, and their significance values, in the form 1-p, follow in Table 4. These would also be uniformly distributed if all null hypotheses were true, and they appear to be spread fairly evenly over the unit interval. For $\{p_{(i)}\}$ equal to the ordered significance values of Table 4, the Kolmogorov-Smirnov statistic is

$$V_n = \max_i \left| p_{(i)} - i/n \right| = .183 \quad\text{and}\quad \Pr(V_n \ge .183) = .401$$

and there is a temptation to stop the analysis here.
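The uniformity check can be sketched in Python. This is an illustration under our own naming: `ks_uniform_statistics` and `kolmogorov_tail` are not from the report. The sketch computes two statistics. The first is max |p_(i) - i/n|, the form written above. The second is the usual two-sided Kolmogorov-Smirnov statistic D_n, which also compares each ordered value with (i - 1)/n. Tail probabilities for both come from the asymptotic Kolmogorov distribution. Applied to the 24 values of Table 4 (which are also Uniform [0,1] under the null hypotheses), the first statistic comes out at about .183, in line with the value quoted above. The two-sided D_n is somewhat larger (about .22), but its asymptotic tail probability (about .18) is still far from any conventional rejection level.

```python
import math


def ks_uniform_statistics(values):
    """Return (V, D) for values that should be Uniform[0, 1] under H0.

    V = max_i |x_(i) - i/n| is the form used in the text; D is the usual
    two-sided Kolmogorov-Smirnov statistic, which also compares each ordered
    value with (i - 1)/n.
    """
    x = sorted(values)
    n = len(x)
    v = max(abs(xi - (i + 1) / n) for i, xi in enumerate(x))
    d = max(max((i + 1) / n - xi, xi - i / n) for i, xi in enumerate(x))
    return v, d


def kolmogorov_tail(stat, n, terms=100):
    """Asymptotic P(statistic >= stat) from the Kolmogorov distribution:
    2 * sum_{k>=1} (-1)^(k-1) * exp(-2 * k^2 * n * stat^2)."""
    lam2 = n * stat * stat
    if math.sqrt(lam2) < 0.2:   # series converges slowly here; tail is ~1
        return 1.0
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam2)
                     for k in range(1, terms + 1))


# Significance values (1 - p) from Table 4, read row by row.
table4 = [.983, .835, .910, .716, .961, .266, .595, .749,
          .792, .307, .864, .358, .901, .388, .811, .581,
          .372, .502, .589, .648, .425, .040, .647, .280]

v, d = ks_uniform_statistics(table4)
n = len(table4)
print(f"V = {v:.3f}, asymptotic tail probability = {kolmogorov_tail(v, n):.3f}")
print(f"D = {d:.3f}, asymptotic tail probability = {kolmogorov_tail(d, n):.3f}")
```

The asymptotic distribution is only an approximation at n = 24. Like the test itself, the sketch also assumes the 24 values are independent, an assumption questioned at the end of this section.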


TABLE 3. CHI-SQUARE VALUES FOR CASUALTY COUNTS INDEPENDENT OF MONTH

                      ATLANTIC                        PACIFIC
            Engineering  Nonengineering    Engineering  Nonengineering
FF   b         28.84         20.17            22.75         17.60
     a         25.87         11.26            15.67         18.23
DD   b         19.12         11.81            21.03         12.48
     a         22.36         12.87            19.58         15.46
CG   b         12.67         14.37            15.57         16.47
     a         13.36          6.91            16.45         11.45

(b = before OPPE, a = after OPPE)


At this level one should realize that failing to reject a particular model does not preclude the acceptability of a competing model. Indeed, the power of the chi-square goodness-of-fit procedure is not great. Accordingly, we try our luck with loglinear models that allow the casualty frequency distribution to vary by month, with month treated as a scored ordinal variable. If an acceptable fit is achieved, we then look for monotonicity of change by month.


TABLE 4. SIGNIFICANCE (1-p VALUES) OF TABLE 3 STATISTICS

                      ATLANTIC                        PACIFIC
            Engineering  Nonengineering    Engineering  Nonengineering
FF   b         .983          .835             .910          .716
     a         .961          .266             .595          .749
DD   b         .792          .307             .864          .358
     a         .901          .388             .811          .581
CG   b         .372          .502             .589          .648
     a         .425          .040             .647          .280

It is interesting to note that the total number of ships constraint (see Table 2) has a profound effect upon the choice of model to be fitted. We begin with the simplest.

Since we are treating the 24 tables separately, we drop all subscripts except r and s for the time being. Let $m_{rs}$ be the expected cell frequency, and adopt the simple ordinal scoring $v_s = s$ for $s = 1, \ldots, 6$, with $\bar v = 3.5$. Consider the loglinear model

$$\log(m_{rs}) = \mu + \delta_r (v_s - \bar v) \qquad (3)$$

with $\sum_r \delta_r = 0$, and r ranging over 1, ..., 4.

The total number of ships constraint requires that

$$\sum_{r=1}^{4} m_{rs} = N \qquad \text{for } s = 1, \ldots, 6 \qquad (4)$$

where N is the appropriate number from Table 2. In terms of our model this requires

$$\sum_{r=1}^{4} \exp\{\mu + \delta_r (v_s - \bar v)\} = N \qquad \text{for } s = 1, \ldots, 6 \qquad (5)$$

which can happen only if all $\delta_r = 0$. The reason is that the left side is a strictly convex function of $v_s$ unless every $\delta_r$ vanishes, so it cannot take the same value at six distinct scores. This in turn removes all usefulness of the model. The same analysis leads to the rejection of the model

$$\log(m_{rs}) = \mu + \alpha_r + \delta (v_s - \bar v). \qquad (6)$$

The simplest feasible model with months taken to be ordinal is

$$\log(m_{rs}) = \mu + \alpha_r + \beta_s + \delta_r (v_s - \bar v) \qquad (7)$$

with $\sum_r \alpha_r = \sum_s \beta_s = \sum_r \delta_r = 0$. The cell means are estimated by iterated proportional scaling.² The 24 chi-square goodness-of-fit test statistics appear in Table 5, and their significance values appear in Table 6. Each statistic has 12 degrees of freedom: 24 cells minus 1 + 3 + 5 + 3 = 12 free parameters.

² Computational support is discussed in Sections 3 and 4. Note that PROC CATMOD of SAS Version 6.06 does not have a command to fit loglinear models with ordinal explanatory variables. However, such a model can be fit using PROC CATMOD by specifying the appropriate design matrix.
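Fitting model (7) to one 4 × 6 table can be sketched in Python. This is a minimal illustration under our own assumptions, not the authors' program. The hypothetical function `fit_model7` cycles through three steps:

- proportional scaling of the row totals (the α_r);
- proportional scaling of the column totals (the β_s);
- for each row, a one-dimensional Newton step that matches Σ_s (v_s - v̄) m_rs to its observed value (the δ_r).

Every update keeps the fitted table inside model (7), so at convergence the fitted means reproduce all of its sufficient statistics. The Newton step for the ordinal term is our choice and may differ from the scaling scheme the authors used. The 4 × 6 table below is hypothetical; it is built so that every month has 44 ships, as for Atlantic frigates in Table 2.

```python
import math


def fit_model7(x, scores=None, tol=1e-9, max_iter=10000):
    """Fit log m_rs = mu + alpha_r + beta_s + delta_r * (v_s - vbar), model (7),
    to an R x S table x (rows r: CASREPT categories, columns s: months).
    Assumes every row and column total of x is positive."""
    R, S = len(x), len(x[0])
    v = scores if scores is not None else list(range(1, S + 1))
    vbar = sum(v) / S
    u = [vs - vbar for vs in v]

    # Sufficient statistics that the fitted means must reproduce.
    row_obs = [sum(x[r]) for r in range(R)]
    col_obs = [sum(x[r][s] for r in range(R)) for s in range(S)]
    score_obs = [sum(u[s] * x[r][s] for s in range(S)) for r in range(R)]

    m = [[1.0] * S for _ in range(R)]
    for _ in range(max_iter):
        for r in range(R):                      # alpha_r: match row totals
            f = row_obs[r] / sum(m[r])
            m[r] = [f * m_rs for m_rs in m[r]]
        for s in range(S):                      # beta_s: match column totals
            f = col_obs[s] / sum(m[r][s] for r in range(R))
            for r in range(R):
                m[r][s] *= f
        for r in range(R):                      # delta_r: one Newton step
            fit = sum(u[s] * m[r][s] for s in range(S))
            info = sum(u[s] ** 2 * m[r][s] for s in range(S))
            t = (score_obs[r] - fit) / info
            m[r] = [m[r][s] * math.exp(t * u[s]) for s in range(S)]
        err = max(
            [abs(sum(m[r]) - row_obs[r]) for r in range(R)]
            + [abs(sum(m[r][s] for r in range(R)) - col_obs[s]) for s in range(S)]
            + [abs(sum(u[s] * m[r][s] for s in range(S)) - score_obs[r])
               for r in range(R)]
        )
        if err < tol:                           # all statistics matched
            break
    return m


def pearson_x2(x, m):
    """Pearson goodness-of-fit statistic comparing counts x with fitted means m."""
    return sum((x[r][s] - m[r][s]) ** 2 / m[r][s]
               for r in range(len(x)) for s in range(len(x[0])))


# Hypothetical table: rows 0, 1, 2, 3+ CASREPTs; columns months 1..6 (44 ships each).
x = [[18, 20, 21, 23, 25, 27],
     [14, 13, 13, 12, 11, 10],
     [7, 6, 6, 6, 5, 4],
     [5, 5, 4, 3, 3, 3]]
m = fit_model7(x)
p_zero = [m[0][s] / sum(m[r][s] for r in range(4)) for s in range(6)]
df = 4 * 6 - (1 + 3 + 5 + 3)   # cells minus free parameters = 12
print("P(zero CASREPTs) by month:", [round(p, 3) for p in p_zero])
print(f"Pearson X^2 = {pearson_x2(x, m):.2f} on {df} df")
```

Because each fitted column sums to the number of ships, the probability of zero CASREPTs in month s is the first-row fitted value divided by that total. This is the quantity tabulated in Table 7.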
Again the Kolmogorov-Smirnov procedure is performed, this time for the 24 significance values of Table 6, producing

$$V_n = \max_i \left| p_{(i)} - i/n \right| = .173 \quad\text{and}\quad \Pr(V_n \ge .173) = .472. \qquad (8)$$

Thus, by this criterion, both models fit the data equally well.

Let us examine our estimates of the probabilities to see whether the number of casualties estimated by the model decreases with time. Table 7 contains a compilation of the probability of zero casualty reports for each month for each of our 24 cases. In 22 of the cases the probabilities grow monotonically by month, thus supporting our assertion. It also seems that the probabilities for engineering casualties change more than those for nonengineering casualties. Curiously, in the two exceptional cases (engineering casualties after the OPPE for Pacific cruisers and destroyers), the probabilities decrease strictly with month. In these two cases, however, the probabilities of zero casualties averaged over engineering and nonengineering essentially follow the asserted monotone increasing pattern. It is instructive for the reader to compare the cumulative distribution functions that result from fitting (7).


TABLE 5. GOODNESS-OF-FIT VALUES FOR THE LOGLINEAR MODEL

                      ATLANTIC                        PACIFIC
            Engineering  Nonengineering    Engineering  Nonengineering
FF   b         10.68          8.88             9.22         11.44
     a         14.74          8.88            13.16         12.16
DD   b         10.55          8.55            11.61         14.00
     a          6.82          5.92            15.87          9.06
CG   b         12.40         13.17            12.62         18.83
     a          8.38         13.12            14.71         10.68
TABLE 6. SIGNIFICANCE (1-p VALUES) OF TABLE 5 STATISTICS

                      ATLANTIC                        PACIFIC
            Engineering  Nonengineering    Engineering  Nonengineering
FF   b         .443          .287             .316          .508
     a         .744          .287             .642          .575
DD   b         .432          .259             .522          .699
     a         .130          .080             .803          .302
CG   b         .582          .643             .603          .907
     a         .246          .639             .742          .443

TABLE 7. PROBABILITY OF ZERO CASREPTS, LOGLINEAR MODEL

                            Engineering                        Nonengineering
Month                1    2    3    4    5    6          1    2    3    4    5    6
FF  b  Atlantic    .405 .487 .568 .645 .713 .772       .377 .422 .467 .511 .553 .594
       Pacific     .418 .472 .525 .575 .621 .664       .459 .495 .532 .566 .597 .626
    a  Atlantic    .536 .576 .613 .647 .676 .701       .479 .498 .516 .532 .548 .563
       Pacific     .451 .479 .507 .535 .563 .590       .438 .468 .497 .524 .550 .574
DD  b  Atlantic    .448 .512 .572 .627 .676 .718       .337 .364 .390 .416 .441 .465
       Pacific     .381 .476 .563 .632 .681 .711       .517 .539 .560 .577 .592 .604
    a  Atlantic    .604 .662 .712 .755 .791 .821       .374 .414 .454 .494 .533 .570
       Pacific     .534 .532 .520 .500 .473 .441       .378 .416 .452 .483 .512 .537
CG  b  Atlantic    .615 .695 .751 .789 .815 .835       .398 .461 .511 .542 .551 .538
       Pacific     .304 .353 .400 .444 .483 .517       .458 .524 .585 .638 .681 .714
    a  Atlantic    .485 .606 .659 .707 .751 .790       .262 .290 .319 .238 .376 .404
       Pacific     .812 .786 .756 .722 .684 .640       .315 .365 .414 .460 .503 .543

To conclude this section, we note that, on a quick look at the p-values, both models are defensible, and the one that models the casualty distribution as a function of time clearly supports our conjecture. We also know that the power of the statistical procedure is not high. More importantly, the 24 cases are not independent. The cross-classifications of before/after and engineering/nonengineering refer to the same ships. There are but six (fleet by ship class) independent data sets.
3. A LOGLINEAR MODEL

The analysis in the previous section gives two separate acceptable models. One indicates that temporal nearness to the OPPE exam has no effect on the number of CASREPTs; the other, contradictory, model indicates that temporal nearness does indeed have an effect. A closer look at the 1-p-values from Tables 4 and 6 helps clear up this discrepancy and motivates the need to consider the data as a whole. If the models fit, i.e., if the p-values (or equivalently the 1-p-values) form simple random samples from a Uniform [0,1] distribution, then subsets of the p-values should also behave as simple random samples from a Uniform [0,1] distribution. Figures 1 and 2 give box plots of the 1-p-values from Tables 4 and 6, respectively, by ship type and by casualty type. In Figure 1 there is clearly some effect that the first set of models is not picking up. This effect is considerably reduced when temporal nearness is included in the models (Figure 2). We note that the Kolmogorov-Smirnov procedure does not have power to detect all types of departures from the null hypothesis. In particular, when applied to the pooled p-values, it cannot detect patterns such as those exhibited in Figure 1. (The independence assumption is important.) From these figures we can conclude that temporal nearness is indeed a variable that needs to be considered, and that there is interaction between temporal nearness and the other variables.

In this section we treat the data as a whole, using a loglinear model to get a better idea of the interaction between temporal nearness and the other variables, and of their effect on the number of CASREPTs. The first step is to choose the main effects and interaction terms for inclusion. There are several strategies for doing this, similar to model selection in regression settings (e.g., Agresti (1990)). Our strategy is motivated by the available software as well as by certain aspects of the problem. Thus, the computational difficulties, and the methods for solving them, are an important feature of this section.

Figure 1. Box Plots of 1-p-values from Table 4 by Ship Type and by Casualty Type

Figure 2. Box Plots of 1-p-values from Table 6 by Ship Type and by Casualty Type


The cell counts are not realizations from a single multinomial distribution. The number of ships in each fleet and ship class is fixed, and the number of casualties is reported for the same ships, for both casualty types, over the 12-month period surrounding the OPPE. The counts can instead be modeled as realizations from several multinomial distributions. In particular, for each i, j, k, l, s, the random vectors

$$(X_{ijkl1s}, X_{ijkl2s}, X_{ijkl3s}, X_{ijkl4s}), \qquad X_{ijkl+s} = N_{jl}, \qquad (9)$$

have multinomial distributions, where $X_{ijklrs}$ is the random variable corresponding to the observed frequency $x_{ijklrs}$. The natural inclination is to take the likelihood to be the product of these multinomials and continue from there. By doing this we are tacitly assuming that the numbers of casualties per casualty type and month before and after the OPPE are independent within each ship type by fleet, as well as between ship types and fleets. The disturbing part of this assumption is that, for each ship type by fleet, the same ships are observed over the 12-month period surrounding the OPPE. If we had data by ship, it might be possible to take into account potential dependence in the number of casualties within a ship using a repeated measures design. However, such data are not available.

Some statistical packages, such as SAS, are able to maximize products of multinomials; others are not. Birch (see Agresti (1990: p. 169)) showed that the MLEs for a multinomial likelihood are the same as the MLEs for product multinomials as long as the model contains a term for the marginal distribution fixed by the sampling design. In this problem the total count for each combination of before/after OPPE, fleet, casualty type, ship type and month is fixed. We designate the design factors and levels as follows:

Factor   No. Levels   Levels
A        2            Before/after OPPE, i = 1, 2
B        2            Atlantic, Pacific fleets, j = 1, 2
C        2            Engineering, nonengineering casualty types, k = 1, 2
D        3            Frigates, Destroyers, Cruisers, l = 1, 2, 3
E        6            Months measured from the time of OPPE, s = 1, ..., 6
F        4            0, 1, 2, 3 or more CASREPTs, r = 1, 2, 3, 4

With this designation, we can use a package that does not explicitly maximize the product of independent multinomials, provided we include the 5-way interaction term ABCDE.

The goal is to find a reasonable model that fits the data but does not include too many parameters. This is an iterative process somewhat similar to stepwise regression. We begin by fitting two models:

- all main effects and ABCDE (likelihood ratio = 663.67 with 562 degrees of freedom and a p-value = .0019);
- main effects, all two-way interactions and ABCDE (likelihood ratio = 496.52 with 488 degrees of freedom and a p-value = .3850).

The model with all two-way interactions appears to fit the data. Thus, we use this model as a starting point and eliminate parameters sequentially until we get a model that is no longer suitable. Backward elimination is much easier and safer than forward selection if you don't have a computer package that does some type of model selection. Which terms to eliminate can be decided by looking at the output from one run of the more expansive model. Forward selection, in contrast, requires that a new model be fit for each term that you might want to add. Starting from the model with just main effects, we would need to make 15 runs to decide which of the two-way interaction terms produces the greatest improvement. SAS version 6.06 was used, even though it does not have a stepwise model selection option, because it allows the inclusion of higher order interaction terms such as ABCDE without requiring that all lower order terms be present.

We remove the terms with the highest p-values for the test of the null hypothesis that the terms are insignificant. From the model with all two-way interaction terms (see Table 8) we remove AB, AC, AD, AE, BC, BE, CD, CE and DE, all of which have p-values > .8. Note that ABCDE is retained even though it has a p-value = 1.0000. Including this term does not affect the estimates of the other parameters or the test statistics, but it is needed to provide the correct degrees of freedom for the model, 488 versus 498. Changing the degrees of freedom from 488 to 498 alters the p-value for the model rather drastically, from 0.3850 to about 0.51. After removing these terms we have the model output given in Table 9.

The overall likelihood ratio test statistic changes only slightly, from 496.52 with 488 degrees of freedom to 497.19 with 520 degrees of freedom. The difference, 0.67 with 32 degrees of freedom, indicates that there is no real difference in the fits of these two models. When eliminating more than one term, it is important to check the difference in the model fits. It could happen that each term by itself is insignificant in the presence of all the other terms, but that the model does not fit when the terms are removed together. This is exactly what would happen if we were to remove all terms (except ABCDE) with p-values > 0.3.
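The overall fits and the nested comparison above can be checked approximately without statistical software. The following Python is a sketch under our own naming; `chi2_sf_wilson_hilferty` and `nested_lr_test` are hypothetical helpers. It approximates the chi-square tail with the Wilson-Hilferty cube-root normal approximation, which is very accurate at the several hundred degrees of freedom involved here. The likelihood-ratio test between nested models uses the difference of their statistics on the difference of their degrees of freedom. With the statistics quoted in the text, it returns p-values close to .0019, .3850, .7572 and .7377.

```python
import math


def chi2_sf_wilson_hilferty(x, df):
    """Approximate upper-tail probability P(X >= x), X ~ chi-square(df),
    using the Wilson-Hilferty cube-root normal approximation."""
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))   # standard normal upper tail


def nested_lr_test(g2_reduced, df_reduced, g2_full, df_full):
    """Likelihood-ratio test of a reduced model against a fuller model containing it."""
    diff = g2_reduced - g2_full
    ddf = df_reduced - df_full
    return diff, ddf, chi2_sf_wilson_hilferty(diff, ddf)


# Overall likelihood ratio statistics and degrees of freedom quoted in the text.
fits = {
    "main effects + ABCDE": (663.67, 562),
    "main effects + two-way + ABCDE (Table 8)": (496.52, 488),
    "Table 9 model": (497.19, 520),
    "final model (Table 12)": (505.96, 527),
}
for name, (g2, df) in fits.items():
    print(f"{name}: G2 = {g2:.2f} on {df} df, p ~ {chi2_sf_wilson_hilferty(g2, df):.4f}")

diff, ddf, p = nested_lr_test(497.19, 520, 496.52, 488)
print(f"Table 9 vs Table 8 model: {diff:.2f} on {ddf} df, p ~ {p:.3f}")
```

For the small difference 0.67, the approximation is evaluated far in the lower tail, where it is less accurate. The conclusion (a p-value essentially equal to 1) does not depend on that accuracy.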

It is clear from the p-values in Table 9 that we are close to a final model, so we now remove terms one at a time: first A, then BF, then AF (see Tables 9-11), giving the model in Table 12. In Table 10 the p-value for B in the presence of BF is 0.1195. However, once BF is eliminated (see Table 11), the p-value for B is 0.0045. This indicates that B and BF explain the same variability in the cell frequencies, and that it would have been a mistake to remove both of them. No further terms can be eliminated from Table 12 without significantly changing the model fit. It is interesting to note that eliminating factor A (before and after) has the effect of combining cells, i.e., it eliminates the subscript i.
TABLE 8. ANALYSIS OF VARIANCE TABLE FOR THE MODEL WITH
ALL MAIN EFFECTS, ALL TWO-WAY INTERACTION TERMS, AND
THE ABCDE TERM

Source                       Degrees of Freedom   Chi-square   p-value
A (before and after OPPE)            1                0.31     0.5783
AB                                   1                0.00          *
AC                                   1                0.01     0.9384*
AD                                   2                0.02     0.9911*
AE                                   5                0.00          *
AF                                   3                5.52     0.1376
B (Fleet)                            1                2.35     0.1252
BC                                   1                0.03          *
BD                                   2               44.94     0.0000
BE                                   5                0.04          *
BF                                   3                3.32     0.3447
C (Casualty type)                    1                8.71     0.0032
CD                                   2                0.02     0.9881*
CE                                   5                0.54          *
CF                                   3               37.11     0.0000
D (Ship type)                        2              564.98     0.0000
DE                                  10                0.02          *
DF                                   6               14.99     0.0203
E (Month)                            5               10.87     0.0541
EF                                  15               60.85     0.0000
F (CASREPTs)                         3             1022.61     0.0000
ABCDE                               10                0.00     1.0000
Likelihood Ratio                   488              496.52     0.3850

* Terms excluded from the model of Table 9.


TABLE 9. ANALYSIS OF VARIANCE TABLE FOR THE MODEL OF
TABLE 8 EXCLUDING THE ASTERISKED TERMS IN TABLE 8

Source                       Degrees of Freedom   Chi-square   p-value
A (before and after OPPE)            1
AF                                   3                5.49     0.1391
B (Fleet)                            1                2.42     0.1195
BD                                   2               44.93     0.0000
BF                                   3                3.25     0.3546
C (Casualty type)                    1               11.14     0.0008
CF                                   3               36.53     0.0000
D (Ship type)                        2              569.00     0.0000
DF                                   6               14.94     0.0208
E (Month)                            5               13.39     0.0200
EF                                  15               60.28     0.0000
F (CASREPTs)                         3             1030.18     0.0000
ABCDE                               10                0.00     1.0000
Likelihood Ratio                   520              497.19     0.7572

TABLE 10. ANALYSIS OF VARIANCE TABLE FOR THE MODEL IN
TABLE 9 EXCLUDING THE A TERM

Source                       Degrees of Freedom   Chi-square   p-value
AF                                   3                5.06     0.1677
B (Fleet)                            1                2.42     0.1195
BD                                   2               44.93     0.0000
BF                                   3                3.25     0.3546*
C (Casualty type)                    1               11.14     0.0008
CF                                   3               36.53     0.0000
D (Ship type)                        2              569.01     0.0000
DF                                   6               14.94     0.0208
E (Month)                            5               13.38     0.0200
EF                                  15               60.28     0.0000
F (CASREPTs)                         3             1030.29     0.0000
ABCDE                               10                0.00     1.0000
Likelihood Ratio                   521              497.64     0.7624

TABLE 11. ANALYSIS OF VARIANCE TABLE FOR THE MODEL IN
TABLE 10 EXCLUDING THE BF TERM

Source                       Degrees of Freedom   Chi-square   p-value
AF                                   3                5.06     0.1677*
B (Fleet)                            1                8.06     0.0045
BD                                   2               44.40     0.0000
C (Casualty type)                    1               11.14     0.0008
CF                                   3               36.53     0.0000
D (Ship type)                        2              568.85     0.0000
DF                                   6               14.39     0.0256
E (Month)                            5               13.39     0.0200
EF                                  15               60.28     0.0000
F (CASREPTs)                         3             1036.71     0.0000
ABCDE                               10                0.00     1.0000
Likelihood Ratio                   524              500.89     0.7593
TABLE 12. ANALYSIS OF VARIANCE TABLE FOR THE FINAL MODEL

Source                       Degrees of Freedom   Chi-square   p-value
B (Fleet)                            1                8.06     0.0045
BD                                   2               44.40     0.0000
C (Casualty type)                    1               11.14     0.0008
CF                                   3               36.53     0.0000
D (Ship type)                        2              568.85     0.0000
DF                                   6               14.38     0.0256
E (Month)                            5               13.39     0.0200
EF                                  15               60.28     0.0000
F (CASREPTs)                         3             1035.71     0.0000
ABCDE                               10                0.00     1.0000
Likelihood Ratio                   527              505.96     0.7377

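The final loglinear model is

$$\ln p_{ijklrs} = \alpha + \alpha^{B}_{j} + \alpha^{C}_{k} + \alpha^{D}_{l} + \alpha^{E}_{s} + \alpha^{F}_{r} + \alpha^{BD}_{jl} + \alpha^{CF}_{kr} + \alpha^{DF}_{lr} + \alpha^{EF}_{sr} + \alpha^{ABCDE}_{ijkls}$$

with the appropriate constraints on the parameters. Here $p_{ijklrs}$ would represent the probability of an observation falling into cell ijklrs had we been sampling from a single multinomial distribution. In our case there are 72 individual multinomial distributions (72 rather than 144 because factor A is eliminated). These are the distributions for the number of CASREPTs (0, 1, 2, 3 or more) with everything else (ijkls) fixed, and their parameters are

$$p_{r \mid ijkls} = \frac{p_{ijklrs}}{\sum_{r'=1}^{4} p_{ijklr's}}, \qquad r = 1, \ldots, 4.$$

Since all terms not involving F cancel, the estimates of these probabilities are

$$\hat p_{r \mid ijkls} = \frac{\exp\{\hat\alpha^{F}_{r} + \hat\alpha^{CF}_{kr} + \hat\alpha^{DF}_{lr} + \hat\alpha^{EF}_{sr}\}}{\sum_{r'=1}^{4} \exp\{\hat\alpha^{F}_{r'} + \hat\alpha^{CF}_{kr'} + \hat\alpha^{DF}_{lr'} + \hat\alpha^{EF}_{sr'}\}}$$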

and are given in Table 13, as percentages. This tells us that, given casualty type, ship type and month, fleet has no effect on the distribution of the number of CASREPTs. Because the final loglinear model has a BD interaction term, it appears that differences between the fleets are due to the fleets having a different mix of ship types (see Table 2). For each ship type by casualty type by fleet, the estimated probabilities of no CASREPTs in a given month increase with distance from the OPPE exam. The same cannot be said for the estimated probabilities of three or more CASREPTs, which increase and then decrease with nearness to the OPPE exam. This is due mostly to the fact that probability functions must sum to one. When cumulative distributions are compared, the monotonicity by month is (essentially) supported. Across all ship types and months, the estimated distribution for Nonengineering CASREPTs is stochastically greater than that for Engineering CASREPTs.
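Because the terms not involving F cancel, each estimated CASREPT distribution is a softmax over r of the F-involving terms. The following Python sketches that computation. The parameter values are hypothetical, since the report does not list the fitted estimates. The argument `other` stands in for the sum of all terms without F; the sketch shows numerically that it cancels. No fleet index appears among the arguments, which is the sense in which fleet has no effect on the estimated distribution given casualty type, ship type and month.

```python
import math


def casrept_distribution(alpha_F, alpha_CF, alpha_DF, alpha_EF, k, l, s, other=0.0):
    """Estimated P(r | k, l, s), r = 0, 1, 2, 3+ CASREPTs, under the final model.

    alpha_F[r], alpha_CF[k][r], alpha_DF[l][r] and alpha_EF[s][r] are the
    F-involving loglinear terms (0-based indices). `other` represents every
    term without F (alpha, alpha^B_j, alpha^BD_jl, ...); it is added to all
    four linear predictors and cancels in the ratio.
    """
    eta = [other + alpha_F[r] + alpha_CF[k][r] + alpha_DF[l][r] + alpha_EF[s][r]
           for r in range(4)]
    top = max(eta)                      # subtract the maximum to avoid overflow
    weights = [math.exp(e - top) for e in eta]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameter values, for illustration only.
alpha_F = [1.0, 0.3, -0.4, -0.9]
alpha_CF = [[0.1, 0.0, -0.05, -0.05],      # engineering
            [-0.1, 0.0, 0.05, 0.05]]       # nonengineering
alpha_DF = [[0.0] * 4 for _ in range(3)]   # frigates, destroyers, cruisers
alpha_EF = [[0.1 * (s - 2.5), 0.0, 0.0, -0.1 * (s - 2.5)] for s in range(6)]

for s in range(6):
    probs = casrept_distribution(alpha_F, alpha_CF, alpha_DF, alpha_EF, k=0, l=0, s=s)
    print(f"month {s + 1}: " + "  ".join(f"{100 * p:5.2f}" for p in probs))
```

Each printed row plays the role of one column of a Table 13 panel: a distribution over 0, 1, 2 and 3 or more CASREPTs, in percent.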

The difference in the distributions of CASREPTs between ship types is not so clear-cut; frigates tend to have the fewest CASREPTs, followed by destroyers, then cruisers. In this model, neither casualty type nor ship type interacts with the number of CASREPTs by month.

Fitting this type of loglinear model is not the only way to analyze these data. In the next section a substantially different approach is used, which uncovers structure in the data not apparent from the analysis in this section.
TABLE 13. ESTIMATES OF THE DISTRIBUTIONS OF THE NUMBER OF
CASREPTS BY MONTH, CASUALTY TYPE AND SHIP TYPE

Frigates, Engineering
CASREPTs   Month 1      2      3      4      5      6
0            52.06  53.84  57.57  58.72  64.17  69.89
1            28.63  26.61  22.25  22.79  20.40  17.45
2            11.42  11.55  11.97  11.16   9.14   7.03
≥3            7.88   7.99   8.21   7.32   6.29   5.62

Frigates, Nonengineering
CASREPTs   Month 1      2      3      4      5      6
0            42.54  44.21  47.77  49.06  54.86  61.18
1            32.10  30.07  25.41  26.20  24.01  21.02
2            14.56  14.80  15.49  14.55  12.19   9.60
≥3           10.71  10.91  11.53  10.18   8.94   8.20

Destroyers, Engineering
CASREPTs   Month 1      2      3      4      5      6
0            51.66  53.27  56.62  58.03  63.59  69.30
1            25.22  23.37  19.43  19.99  17.95  15.36
2            11.87  11.97  12.32  11.55   9.49   7.30
≥3           11.26  11.38  11.63  10.42   8.97   8.04

Destroyers, Nonengineering
CASREPTs   Month 1      2      3      4      5      6
0            41.79  43.29  46.44  48.00  53.88  60.16
1            28.08  26.14  21.93  22.75  20.93  18.35
2            14.98  15.18  15.77  14.91  12.54   9.89
≥3           15.15  15.39  15.86  14.34  12.64  11.60

Cruisers, Engineering
CASREPTs   Month 1      2      3      4      5      6
0            52.93  54.64  58.23  59.50  64.90  70.43
1            26.90  24.96  20.80  21.34  19.07  16.25
2             9.77   9.87  10.19   9.52   7.78   5.96
≥3           10.40  10.53  10.78   9.64   8.25   7.36

Cruisers, Nonengineering
CASREPTs   Month 1      2      3      4      5      6
0                          48.25  49.69  55.46  61.59
1                          23.72  24.52  22.43  19.56
2                          13.17  12.41  10.38   8.14
≥3                         14.86  13.39  11.74  10.71

4. COMPARISON OF ENGINEERING AND NONENGINEERING CASUALTY REPORTS

It is of interest to study the effects of the various factors on the ratio of
engineering to nonengineering CASREPTs. The concern is that resources
may be diverted from nonengineering to engineering in order to prepare for
the OPPE. There may also be a post-exam recovery effect. The technique
chosen here does not use the model developed in Section 3; it represents an
alternative form of analysis, and it is instructive to explore this alternative.

It begins with an attempt to simplify the data set by collapsing six
dimensions to five. Specifically, let

$$ y_{ijkls} = \sum_{r=1}^{4} (r-1)\, x_{ijklrs} $$

be the number of CASREPTs (more specifically, a lower bound for the
number) recorded in before/after category i, fleet j, casualty type k, ship class
l in month s (i = 1, 2; j = 1, 2; k = 1, 2; l = 1, 2, 3; s = 1, 2, ..., 6). Here
x_{ijklrs} is the number of ships in that cell reporting r - 1 CASREPTs, with
r = 4 standing for 3 or more; because such ships are credited with exactly 3,
the sum is a lower bound.
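A minimal sketch of the collapsing step in Python, assuming the four frequency counts of a cell are ordered as 0, 1, 2 and 3 or more CASREPTs (r = 1, ..., 4). The function name `collapsed_count` is hypothetical, and the counts are the first data block listed in Appendix A (label PFFEX), used here only to illustrate the arithmetic.

```python
def collapsed_count(freqs):
    """Lower bound on the number of CASREPTs in one cell.

    freqs[r - 1] is the number of ships reporting r - 1 CASREPTs; the last
    entry counts ships reporting 3 or more, which are credited with exactly 3.
    """
    if len(freqs) != 4:
        raise ValueError("expected counts for 0, 1, 2 and 3+ CASREPTs")
    return sum((r - 1) * x for r, x in enumerate(freqs, start=1))


# Appendix A, block PFFEX: rows are 0, 1, 2, 3+ CASREPTs; columns are months 1-6.
pffex = [
    [18, 19, 20, 21, 25, 28],
    [7, 10, 8, 14, 6, 7],
    [6, 6, 9, 1, 4, 3],
    [9, 5, 3, 4, 5, 2],
]
by_month = [collapsed_count([row[s] for row in pffex]) for s in range(6)]
print(by_month)  # [46, 37, 35, 28, 29, 19]
```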

These values have the advantage of containing no zeroes, having five
dimensions instead of six, and not possessing any restricting marginal totals such as
those of Table 2. Thus, one might expect the data in this form to be simpler to
model. We shall see, however, that it is in fact more difficult to model. The
reason is that we do not have CASREPT information for the
individual ships; we only have data for the cross-classification of fleet by ship
class. In the cross-classified data, there are more CASREPTs for Atlantic
frigates than for Atlantic destroyers because there are more frigates than
destroyers in the Atlantic, and so on. Many of the model effects estimated from this
data structure are devoted to representing this information.

An additional reason for collapsing the data is to gain experience with a
second software system, specifically the categorical data analysis
portion of STATGRAPHICS by STSC. This is an interactive package that runs
on PCs, offers stepwise (forward and backward) model selection,
and allows graphical study of the residuals. On the negative side,
this system treats only hierarchical models: if a certain interaction appears in
the set of generators, then all main effects and lower-order interactions that can
be constructed from that generator must also appear in the model. Thus
it is not possible to include an isolated high-order interaction term to treat
a design constraint, as was done in Section 3. The factors
and levels are designated as in Section 3.
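The hierarchical restriction can be illustrated with a short Python sketch that expands a generating class into every term a hierarchical model must then contain. It assumes single-letter factor names, as in the generators used in this section; `hierarchical_closure` is a hypothetical helper, not a STATGRAPHICS function.

```python
from itertools import combinations


def hierarchical_closure(generators):
    """All terms implied by a hierarchical generating class.

    Each generator is a string of factor letters such as "ABD"; every
    nonempty subset of a generator must also appear in the model.
    """
    terms = set()
    for gen in generators:
        letters = sorted(gen)
        for k in range(1, len(letters) + 1):
            for combo in combinations(letters, k):
                terms.add("".join(combo))
    # Order by interaction order, then alphabetically.
    return sorted(terms, key=lambda t: (len(t), t))


print(hierarchical_closure(["ABD"]))
# ['A', 'B', 'D', 'AB', 'AD', 'BD', 'ABD']
two_way = [t for t in hierarchical_closure(["ABD", "ACD", "BCE", "BDE"]) if len(t) == 2]
print(two_way)
# ['AB', 'AC', 'AD', 'BC', 'BD', 'BE', 'CD', 'CE', 'DE']
```

For the generators ABD ACD BCE BDE the AE interaction is absent; adding ACE brings it in, which is why an ACE term costs more than its own degrees of freedom under a hierarchical algorithm.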

It is instructive to relate some experiences in the art of modeling.
The TEST ORDER option leads one to explore models containing 3-way
effects. Once this is done, BACKWARD SELECTION is used to
produce models that fit adequately and are parsimonious in terms of the
number of effects included. This leads to consideration of the model
having generators

ABD ACD BCE BDE.

The fitting information for this set is

                                   Value     d.f.     P
   Likelihood ratio chi-square    91.0713     85    .3056
   Pearson chi-square             85.4251     85    .4667
This model fits the data reasonably well and was chosen for further study to
look for potential outliers and patterned residuals. The STATGRAPHICS
plotting options applied to the standardized residuals reveal two outliers: (i) a value
of -2.33 for Pacific cruisers, nonengineering, 5 months before the OPPE, and
(ii) a value of 3.015 for Atlantic frigates, engineering, 6 months after the OPPE.
Although these outliers were not especially severe, an effort was made to improve
the model by adding interaction terms. The normal probability
plot of the residuals also pointed to the possibility of improvement.

Accordingly, some additional exploration was performed, and it was
decided to include the ACE interaction term in the generators. This term
alone costs 5 degrees of freedom and, because of the hierarchical nature of the
algorithm, another 5 degrees of freedom are used by the AE
interaction, which must also be included. Thus the final set of generators is

ABD ACD ACE BCE BDE
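The degree-of-freedom bookkeeping can be checked with a small Python sketch that counts the free parameters in the hierarchical closure of a generating class. It assumes the level counts implied by the index ranges of the collapsed table: A (before/after) 2, B (fleet) 2, C (casualty type) 2, D (ship class) 3, E (month) 6, giving 144 cells; each term contributes the product of (levels - 1) over its factors. The letter-to-factor assignment is inferred from the subscripts of the loglinear model, and the function names are hypothetical.

```python
from itertools import combinations
from math import prod

# Assumed number of levels for each factor letter.
LEVELS = {"A": 2, "B": 2, "C": 2, "D": 3, "E": 6}


def closure(generators):
    """Every nonempty subset of every generator (hierarchical principle)."""
    return {"".join(c) for g in generators
            for k in range(1, len(g) + 1)
            for c in combinations(sorted(g), k)}


def residual_df(generators, levels=LEVELS):
    """Number of cells minus number of free parameters (intercept included)."""
    cells = prod(levels.values())
    n_params = 1 + sum(prod(levels[f] - 1 for f in term)
                       for term in closure(generators))
    return cells - n_params


print(residual_df(["ABD", "ACD", "BCE", "BDE"]))         # 85
print(residual_df(["ABD", "ACD", "ACE", "BCE", "BDE"]))  # 75
```

Both counts agree with the 85 and 75 degrees of freedom in the two fitting summaries, and the difference of 10 is the 5 for ACE plus the 5 for AE.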


and the fitting summary is

                                   Value     d.f.     P
   Likelihood ratio chi-square    79.2540     75    .3463
   Pearson chi-square             73.1462     75    .5391
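The P column is the upper-tail probability of a chi-square distribution with the stated degrees of freedom. A minimal standard-library sketch evaluates it through the power series of the regularized lower incomplete gamma function; this is a swapped-in method and not necessarily how STATGRAPHICS computes it, and `chi2_sf` is a hypothetical name.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with df d.f.

    Uses P(a, z) = z**a * exp(-z) / Gamma(a) * sum_n z**n / (a (a+1) ... (a+n)),
    with a = df/2 and z = x/2, which converges for all z > 0.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-16 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = math.exp(a * math.log(z) - z - math.lgamma(a)) * total
    return 1.0 - lower


fits = {
    "final model, likelihood ratio": (79.2540, 75),
    "final model, Pearson": (73.1462, 75),
    "first model, Pearson": (85.4251, 85),
}
for name, (stat, df) in fits.items():
    print(f"{name}: P = {chi2_sf(stat, df):.4f}")
```

The printed probabilities should be close to the corresponding entries of the P column in the two fitting summaries.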

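The full loglinear model is

$$
\begin{aligned}
\ln m_{ijkls} = {} & \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \lambda^{C}_{k} + \lambda^{D}_{l} + \lambda^{E}_{s} \\
& + \lambda^{AB}_{ij} + \lambda^{AC}_{ik} + \lambda^{AD}_{il} + \lambda^{AE}_{is} + \lambda^{BC}_{jk} + \lambda^{BD}_{jl} + \lambda^{BE}_{js} + \lambda^{CD}_{kl} + \lambda^{CE}_{ks} + \lambda^{DE}_{ls} \\
& + \lambda^{ABD}_{ijl} + \lambda^{ACD}_{ikl} + \lambda^{ACE}_{iks} + \lambda^{BCE}_{jks} + \lambda^{BDE}_{jls},
\end{aligned}
$$

where m_{ijkls} is the expected number of CASREPTs in cell ijkls, for i = 1, 2; j = 1, 2; k = 1, 2; l = 1, 2, 3; s = 1, ..., 6, with the
usual constraints that effects and interactions sum to zero over each of their indices. Plots of
standardized residuals versus fitted values and Normal probability plots
appear in Figure 3.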

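This fitted model will be used to study the behavior of the log odds of
engineering and nonengineering casualties. With k = 1 denoting engineering
and k = 2 nonengineering, the induced model is

$$
\ln\frac{m_{ij1ls}}{m_{ij2ls}} = \mu + \alpha_i + \beta_j + \gamma_l + \delta_s + (\alpha\gamma)_{il} + (\alpha\delta)_{is} + (\beta\delta)_{js},
$$

which is reasonably compact. We must include the constraints

$$
\sum_i \alpha_i = \sum_j \beta_j = \sum_l \gamma_l = \sum_s \delta_s = 0,
\qquad
\sum_i (\alpha\gamma)_{il} = \sum_l (\alpha\gamma)_{il} = \sum_i (\alpha\delta)_{is} = \sum_s (\alpha\delta)_{is} = \sum_j (\beta\delta)_{js} = \sum_s (\beta\delta)_{js} = 0.
$$

Positive values for the log odds mark engineering CASREPTs as being favored (more
prominent), and negative values favor the nonengineering type. Of course this
represents a filtering of information, and the original model cannot be
recovered from it.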
Figure 3. Plots of the Standardized Residuals vs. the Expected Counts and
the Normal Probability Plots for the Final Model
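The effects are readily identified as

$$
\mu = 2\lambda^{C}_{1},\quad \alpha_i = 2\lambda^{AC}_{i1},\quad \beta_j = 2\lambda^{BC}_{j1},\quad \gamma_l = 2\lambda^{CD}_{1l},\quad \delta_s = 2\lambda^{CE}_{1s},
$$

$$
(\alpha\gamma)_{il} = 2\lambda^{ACD}_{i1l},\quad (\alpha\delta)_{is} = 2\lambda^{ACE}_{i1s},\quad (\beta\delta)_{js} = 2\lambda^{BCE}_{j1s},
$$

since every term not involving C cancels in the difference of the two logarithms, and the
sum-to-zero constraints give \lambda^{C}_{2} = -\lambda^{C}_{1} (and likewise for each interaction involving C).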


Figures 4 and 5 show the six-month time traces of the log odds for before and
after, crossed by the three ship classes; Figure 4 treats the Atlantic Fleet data
and Figure 5 treats the Pacific. For both fleets the traces are generally parallel,
and the post-OPPE curves are below the pre-OPPE curves.

Figure 4. Traces of the Log Odds versus Month from OPPE for the Atlantic
Fleet by Before/After and Ship Type


Thus, the transfer-of-resources effect might be associated with an
imbalance of CASREPTs 2 to 5 months after the OPPE for the Atlantic Fleet, and 1
to 4 months after for the Pacific Fleet. For both fleets, the before and after curves
for cruisers are sharply separated. The curves for the
destroyers show a bit less separation, and those for frigates even less. In fact,
the pre- and post-OPPE frigate curves actually intersect.

Figure 5. Traces of the Log Odds versus Month from OPPE for the Pacific
Fleet by Before/After and Ship Type


These results are not inconsistent with those of Section 3. An estimate of a
lower bound for the expected number of CASREPTs can be found from Table
13 by

$$ \sum_{r=1}^{4} (r-1)\,\hat{p}_{klrs} $$

for each ijkls, where \hat{p}_{klrs} is the Table 13 estimate (as a proportion) that a
ship of class l reports r - 1 CASREPTs of type k in month s, with r = 4 standing for
3 or more. The traces of the log odds are given in Figure 6. Because the
Before/After variable was dropped, pre-OPPE and post-OPPE curves are not
available. Also, Atlantic and Pacific Fleet curves would be identical. Even
though the interaction between casualty type and month, and the interaction
among ship type, casualty type and month, did not appear in the loglinear model
of Section 3, these interactions are obvious from the traces of the estimated
expected number of CASREPTs.
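The lower-bound calculation and the resulting log odds can be sketched in Python for the frigate entries of Table 13; the other ship types follow the same pattern. The sketch assumes the log odds are formed as ln(engineering / nonengineering), so that negative values favor the nonengineering type as in the sign convention above; the helper name is hypothetical.

```python
import math

# Table 13 estimates (percent) for frigates, months 1-6; rows are 0, 1, 2, 3+ CASREPTs.
FF_ENG = [
    [52.06, 53.84, 57.57, 58.72, 64.17, 69.89],
    [28.63, 26.61, 22.25, 22.79, 20.40, 17.45],
    [11.42, 11.55, 11.97, 11.16, 9.14, 7.03],
    [7.88, 7.99, 8.21, 7.32, 6.29, 5.62],
]
FF_NON = [
    [42.54, 44.21, 47.77, 49.06, 54.86, 61.18],
    [32.10, 30.07, 25.41, 26.20, 24.01, 21.02],
    [14.56, 14.80, 15.49, 14.55, 12.19, 9.60],
    [10.71, 10.91, 11.53, 10.18, 8.94, 8.20],
]


def lower_bound_means(table):
    """Sum over r of (r - 1) times the estimated probability, for each month."""
    return [sum((r - 1) * table[r - 1][s] / 100.0 for r in range(1, 5))
            for s in range(len(table[0]))]


eng = lower_bound_means(FF_ENG)
non = lower_bound_means(FF_NON)
log_odds = [math.log(e / n) for e, n in zip(eng, non)]
for month, lo in enumerate(log_odds, start=1):
    print(f"month {month}: log odds = {lo:+.3f}")
```

For month 1 the lower bounds are 0.7511 engineering and 0.9335 nonengineering CASREPTs per ship, a log odds of about -0.217.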


Figure 6. Log Odds vs. Month from OPPE by Ship Type


5.  CONCLUSIONS 

From the analysis in the previous sections a few things stand out. First,
failing to reject a particular model does not mean that it actually fits the data;
different approaches to analyzing the same data can often uncover new
relationships. Second, in the analyses of Sections 3 and 4 we proceeded as if
there were more data than there actually were. In fact, 157 different ships were
observed, with only 8 Atlantic Fleet cruisers and 10 Pacific Fleet cruisers.

Thus, although it is clear that CASREPTs tend to increase with proximity to
the OPPE, that the ratio of engineering to nonengineering CASREPTs
tends to decrease with proximity, and that there appears to be a difference
among the three ship types, some of the finer distinctions may be due to
sampling error. Finally, we did not find a single statistical package that could
easily handle all aspects of this analysis; all had their drawbacks.
APPENDIX A. SAS CODE


The following code is an example of the Job Control Language (JCL) and
SAS commands that can be used on MVS at the Naval Postgraduate School to fit
the loglinear model whose parameter estimates are given in Table 12. For
a detailed explanation of using SAS on MVS, see Davis (1990). In this
particular example, the data are entered as they appear in Tables 18 and 19 of
Klingseis (1979). Entering the data in this format necessitates the rather
intricate DATA step. PROC CATMOD is used to fit the loglinear
model. Rather than using the POPULATION statement to get maximum
likelihood estimates for a product multinomial likelihood, the term
SHIP*CASTYPE*FLEET*BA*MONTH is included and a single multinomial
likelihood is maximized. Two files are created: a LOGLIN LISTING file, sent to the
user's reader, which contains the output from PROC CATMOD, and a SAS file
PRED.SAS, which includes the SAS data set RESID.PRED. Among other
things, this data set contains the estimated and observed cell probabilities,
which can be used to get the standardized residuals. Other SAS PROCs are
then used to table and plot these residuals.


FILE: EXAMPLE SAS A

//LOGLIN   JOB (5096,9999),'L WHITAKER',CLASS=J
//         EXEC SAS606,REGION=7000K
//INI      DD DSN=MSS.F4077.SAS12,USA,DISP=SHR
//RESID    DD DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
//         DCB=(RECFM=FB,LRECL=40,BLKSIZE=23440),SPACE=(23400,(1,1))
//SYSIN    DD *
TITLE 'FINAL MODEL';
DATA RESID.OPPE;
  FORMAT CHARV $5. FLEET $1. SHIP $2. BA $1. NCAS $5.;
  * Label = fleet (L or P), ship type, E if engineering, before/after code;
  INPUT CHARV $;
  FLEET=SUBSTR(CHARV,1,1);
  SHIP=SUBSTR(CHARV,2,2);
  * A 4-character label (no E) marks a nonengineering cell;
  IF LENGTH(CHARV)=4 THEN CASTYPE='N'; ELSE CASTYPE='E';
  BA=SUBSTR(CHARV,LENGTH(CHARV),1);
  * Next 4 lines: counts for 0, 1, 2 and 3+ CASREPTs in months 1-6;
  DO NCAS='ZERO','ONE','TWO','3PLUS';
    DO MONTH=1,2,3,4,5,6;
      INPUT COUNT @;
      * Replace zero counts by a tiny value so the cells are retained;
      IF COUNT=0 THEN COUNT=1E-20;
      OUTPUT;
    END;
    INPUT;
  END;
CARDS;
PFFEX
18 19 20 21 25 28
7 10 8 14 6 7
6 6 9 1 4 3
9 5 3 4 5 2
PFFX
21 19 19 21 24 27
14 15 13 11 6 9
2 4 6 3 4 4
3 2 2 5 6 0
PDDEX
3 7 11 10 13 13
5 7 4 2 1 1
3 4 3 2 2 1
2 0 0 4 2 3
LCGY
2 3 2 3 2 4
4 1 2 2 2 2
12 112 0
1 2 3 2 2 2
;
PROC CATMOD DATA=RESID.OPPE ORDER=DATA;
  WEIGHT COUNT;
  * Save observed and predicted cell probabilities and residuals;
  RESPONSE / OUT=RESID.PRED(KEEP=NCAS SHIP FLEET CASTYPE MONTH
                            _PRED_ _OBS_ _RESID_ _SEOBS_ _SEPRED_);
  MODEL NCAS*BA*SHIP*MONTH*FLEET*CASTYPE=_RESPONSE_
        / NODESIGN NOPROFILE NORESPONSE;
  LOGLIN NCAS|SHIP@2
         SHIP*FLEET FLEET
         MONTH*NCAS MONTH
         SHIP*BA*MONTH*FLEET*CASTYPE;
RUN;
/*
//
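The standardized residuals mentioned above can be formed from the observed and estimated cell probabilities stored in the output data set. A minimal Python sketch, assuming the Pearson form (n p_obs - n p_hat) / sqrt(n p_hat) with n the total count; this is one common definition and may differ in detail from the residuals reported by PROC CATMOD or STATGRAPHICS. The function name and the probabilities in the example are hypothetical.

```python
import math


def pearson_residuals(p_obs, p_hat, n):
    """(observed - expected) / sqrt(expected), with counts formed as n * p."""
    return [(n * po - n * ph) / math.sqrt(n * ph) for po, ph in zip(p_obs, p_hat)]


# Hypothetical two-cell example with 100 observations.
res = pearson_residuals([0.6, 0.4], [0.5, 0.5], 100)
print(res)
print(sum(r * r for r in res))  # Pearson chi-square for these cells
```

The squared residuals sum to the Pearson chi-square statistic, 4.0 in this example.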


APPENDIX B. MAXIMUM LIKELIHOOD ESTIMATION FOR
CENSORED GEOMETRIC AND POISSON DISTRIBUTIONS

The frequency counts f_j (j = 0, 1, 2, 3) represent the number of ships reporting 0, 1,
2, and 3 or more casualties, respectively. These are right-censored data, and the
censoring influences the maximum likelihood estimation method. Indeed, the
estimators developed below (or their equivalents) are necessary to support
the chi-square test statistics used in goodness-of-fit testing. Both the
geometric and Poisson distributions are candidates to model these
frequencies; it is natural to default to these familiar distributions. What follows
is an analysis of goodness-of-fit testing when these two distributions are fitted
to the frequency counts, pooled over the six months and treated as 24
separate experiments as in Section 2.

Geometric. Consider the censored geometric probability function

$$
p_j = q\,p^{j} \quad \text{for } j = 0, 1, \ldots, c-1, \qquad p_c = p^{c}, \tag{B.1}
$$

with p + q = 1. The data consist of counts f_0, f_1, ..., f_c; let N = \sum_{j=0}^{c} f_j.
We proceed to develop the likelihood function, its logarithm, and the
maximum likelihood equations:

$$
L(p) = \prod_{j=0}^{c} p_j^{f_j},
$$

$$
\varphi = \ln L = \sum_{j=0}^{c-1} f_j \ln q + \sum_{j=0}^{c} j f_j \ln p = (N - f_c)\ln q + S \ln p,
$$

where S = \sum_{j=0}^{c} j f_j. Then

$$
\frac{d\varphi}{dp} = \frac{S}{p} - \frac{N - f_c}{q},
$$

which is set to zero. The solution for p is the maximum likelihood estimator

$$
\hat{p} = \frac{S}{S + N - f_c}.
$$
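The closed-form estimator can be evaluated in a few lines of Python. The sketch assumes the parameterization (B.1) implied by the log-likelihood above, so that q = 1 - p is the fitted probability of zero reports; the counts are hypothetical and the function name is illustrative.

```python
import math


def censored_geometric_mle(f):
    """MLE of p for the censored geometric law p_j = q p**j (j < c), p_c = p**c.

    f = [f_0, ..., f_c]; f_c counts ships reporting c or more casualties.
    """
    c = len(f) - 1
    n = sum(f)
    s = sum(j * fj for j, fj in enumerate(f))  # S, with category c counted as c
    return s / (s + n - f[c])


f = [40, 25, 15, 20]  # hypothetical counts for 0, 1, 2, 3+ reports
p_hat = censored_geometric_mle(f)
print(f"p = {p_hat:.4f}, -ln(p) = {-math.log(p_hat):.3f}, P(0) = {1 - p_hat:.4f}")
```

Here S = 115 and N - f_c = 80, so the estimate is 115/195, about 0.590.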


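Poisson. The censored Poisson probability function is

$$
p_j = \frac{e^{-\lambda}\lambda^{j}}{j!} \quad \text{for } j = 0, \ldots, c-1, \qquad p_c = 1 - \sum_{j=0}^{c-1} p_j. \tag{B.3}
$$

Again f_0, ..., f_c are the counting data and N = \sum_{0}^{c} f_j. The general structure of
the likelihood system is

$$
L(\lambda) = \prod_{j=0}^{c} p_j^{f_j}, \qquad \varphi = \ln L = \sum_{j=0}^{c} f_j \ln p_j.
$$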

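The components of this system are best treated with the following technique:

$$
\frac{dp_0}{d\lambda} = -e^{-\lambda} = -p_0 = p_{-1} - p_0,
\qquad
\frac{dp_j}{d\lambda} = p_{j-1} - p_j \quad \text{for } j = 1, \ldots, c-1,
$$

$$
\frac{dp_c}{d\lambda} = -\sum_{j=0}^{c-1} (p_{j-1} - p_j) = p_{c-1},
$$

where p_{-1} = 0 by convention. Then

$$
\frac{1}{p_j}\frac{dp_j}{d\lambda} = \frac{p_{j-1}}{p_j} - 1 \quad \text{for } j = 0, \ldots, c-1,
\qquad
\frac{1}{p_c}\frac{dp_c}{d\lambda} = \frac{p_{c-1}}{p_c}.
$$

These quantities are then placed into \varphi_\lambda = d\varphi/d\lambda, giving the structure

$$
\varphi_\lambda = \sum_{j=0}^{c-1} f_j\left(\frac{p_{j-1}}{p_j} - 1\right) + f_c\,\frac{p_{c-1}}{p_c}. \tag{B.4}
$$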


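It is required to solve \varphi_\lambda = 0. This equation is nonlinear and an explicit
solution is not possible; Newton-Raphson iteration works quite well,
however. To execute it, let g(\lambda) = \varphi_\lambda and evaluate the derivative

$$
g'(\lambda) = \sum_{j=0}^{c-1} f_j \frac{d}{d\lambda}\left(\frac{p_{j-1}}{p_j}\right) + f_c\,\frac{(p_{c-2} - p_{c-1})\,p_c - p_{c-1}^{2}}{p_c^{2}},
$$

because dp_{c-1}/d\lambda = p_{c-2} - p_{c-1} and dp_c/d\lambda = p_{c-1}. For the Poisson terms
p_{j-1}/p_j = j/\lambda, so the first sum equals -\sum_{j=1}^{c-1} j f_j / \lambda^2.

The Newton-Raphson iteration formula is

$$
\lambda \leftarrow \lambda - g(\lambda)/g'(\lambda) \tag{B.5}
$$

and it can be initiated with a starting value \lambda_0 and stopped when |g(\lambda)| < \varepsilon for some
user-defined \varepsilon > 0.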
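The Newton-Raphson scheme (B.4)-(B.5) can be sketched in Python as follows. The starting value lambda_0 = S/N (with 3 or more counted as 3) and the step-halving safeguard that keeps lambda positive are assumptions added for illustration, not part of the method described above; all function names are hypothetical.

```python
import math


def censored_poisson_probs(lam, c):
    """p_0, ..., p_{c-1} from the Poisson law and p_c = 1 - their sum (B.3)."""
    p = [math.exp(-lam) * lam ** j / math.factorial(j) for j in range(c)]
    p.append(1.0 - sum(p))
    return p


def score(lam, f):
    """g(lambda) of (B.4), using p_{j-1}/p_j = j/lambda for j < c (p_{-1} = 0)."""
    c = len(f) - 1
    p = censored_poisson_probs(lam, c)
    uncensored = sum(fj * (j / lam - 1.0) for j, fj in enumerate(f[:c]))
    return uncensored + f[c] * p[c - 1] / p[c]


def score_derivative(lam, f):
    """g'(lambda): -sum j f_j / lambda**2 plus the censored-cell quotient-rule term."""
    c = len(f) - 1
    p = censored_poisson_probs(lam, c)
    p_cm2 = p[c - 2] if c >= 2 else 0.0  # convention p_{-1} = 0
    uncensored = -sum(j * fj for j, fj in enumerate(f[:c])) / lam ** 2
    censored = f[c] * ((p_cm2 - p[c - 1]) * p[c] - p[c - 1] ** 2) / p[c] ** 2
    return uncensored + censored


def censored_poisson_mle(f, lam0=None, eps=1e-10, max_iter=100):
    """Newton-Raphson iteration (B.5), stopped when |g(lambda)| < eps."""
    n = sum(f)
    s = sum(j * fj for j, fj in enumerate(f))
    lam = lam0 if lam0 is not None else max(s / n, 1e-3)  # assumed start: S/N
    for _ in range(max_iter):
        g = score(lam, f)
        if abs(g) < eps:
            return lam
        step = lam - g / score_derivative(lam, f)
        lam = step if step > 0 else lam / 2.0  # keep lambda positive
    raise RuntimeError("Newton-Raphson did not converge")


print(censored_poisson_mle([40, 25, 15, 20]))         # hypothetical counts
print(censored_poisson_mle([5, 3, 2, 0], lam0=1.0))   # approximately 0.7
```

With no censored ships (f_c = 0) the score reduces to S/lambda - N, so the iteration returns the ordinary Poisson estimate S/N; the counts [5, 3, 2, 0] give 0.7.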

Tables 14 through 16 show the results of fitting the geometric distribution
(B.1) to the 24 cases. The estimates for p and -ln(p) are both tabled; -ln(p) can
be compared to the λ estimates for the Poisson model. Note that under (B.1) the
probability of zero CASREPTs is q = 1 - p. (These p values are not to be confused
with the significance values.) Both Pearson and likelihood ratio chi-square test
statistics are listed in Table 15. Generally the tests fail, but for different reasons,
which accounts for the large differences between the two statistics. Table 16 shows
that all but a handful of the tests fail.

Tables 17 through 19 show similar results for the Poisson model (B.3). The
format is the same and generally so are the results. The two models do not
agree with the data, or with each other.
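The goodness-of-fit values in Tables 15 through 19 rest on the Pearson and likelihood ratio statistics and, with four categories and one estimated parameter, on a chi-square reference distribution with 2 d.f., whose CDF is 1 - exp(-x/2). The sketch below assumes that reading of the 2 shown in the titles of Tables 15 and 18; for instance, a statistic of 1.8 gives 0.593, close to the .596 listed in Table 16 for the rounded value 1.8. The counts are hypothetical, the fitted probabilities use the geometric parameterization (B.1), and the function names are illustrative.

```python
import math


def gof_statistics(f, p):
    """Pearson and likelihood ratio chi-square for counts f and fitted probabilities p."""
    n = sum(f)
    expected = [n * pj for pj in p]
    pearson = sum((o - e) ** 2 / e for o, e in zip(f, expected))
    # Cells with a zero count contribute nothing to the likelihood ratio statistic.
    lr = 2.0 * sum(o * math.log(o / e) for o, e in zip(f, expected) if o > 0)
    return pearson, lr


def significance_2df(x):
    """1 - p value (the chi-square CDF) for a statistic on 2 degrees of freedom."""
    return 1.0 - math.exp(-x / 2.0)


# Hypothetical counts for 0, 1, 2 and 3+ reports, fitted by the censored geometric law.
f = [40, 25, 15, 20]
n = sum(f)
s = sum(j * fj for j, fj in enumerate(f))
p = s / (s + n - f[-1])
q = 1.0 - p
fitted = [q, q * p, q * p ** 2, p ** 3]
pearson, lr = gof_statistics(f, fitted)
print(f"Pearson {pearson:.2f} ({significance_2df(pearson):.3f}), "
      f"LR {lr:.2f} ({significance_2df(lr):.3f})")
```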


Fitting the Geometric Distribution

TABLE 14. MAXIMUM LIKELIHOOD ESTIMATES FOR p/-ln(p)

                      ATLANTIC                        PACIFIC
              Engineering  Nonengineering    Engineering  Nonengineering
FF  Before    .416/ .88     .494/ .71        .479/ .74     .436/ .83
    After     .380/ .97     .486/ .72        .471/ .75     .495/ .70
DD  Before    .451/ .80     .574/ .55        .472/ .75     .487/ .72
    After     .317/1.15     .492/ .71        .500/ .69     .567/ .57
CG  Before    .364/1.01     .533/ .63        .539/ .62     .394/ .93
    After     .303/1.19     .606/ .50        .325/1.12     .509/ .67

TABLE 15. PEARSON/LIKELIHOOD RATIO CHI-SQUARE (2 d.f.)
GOODNESS-OF-FIT VALUES

                      ATLANTIC                        PACIFIC
              Engineering  Nonengineering    Engineering  Nonengineering
FF  Before     4.5/19.7      9.5/40.5        15.1/38.0      5.9/22.6
    After      1.8/12.7      9.0/37.0         6.1/28.8      4.8/33.2
DD  Before    31.2/37.3     18.0/51.9         6.8/16.8     20.9/27.7
    After     10.0/11.6      6.7/25.9         2.5/15.7     21.2/38.4
CG  Before     9.4/13.0     14.0/17.4         8.4/16.5      4.9/ 7.1
    After      1.3/ 2.0     15.2/23.2         3.3/ 4.8      3.2/11.0

TABLE 16. SIGNIFICANCE (1-p VALUES)

                      ATLANTIC                        PACIFIC
              Engineering  Nonengineering    Engineering  Nonengineering
FF  Before    .896/1.00     .991/1.00        1.000/1.00     .947/1.00
    After     .596/1.00     .989/1.00         .952/1.00     .911/1.00
DD  Before   1.000/1.00    1.000/1.00         .967/1.00    1.000/1.00
    After     .993/1.00     .965/1.00         .709/1.00    1.000/1.00
CG  Before    .991/1.00     .999/1.00         .985/1.00     .915/ .97
    After     .465/ .64     .999/1.00         .807/ .91     .801/1.00
Fitting the Poisson Distribution

TABLE 17. MAXIMUM LIKELIHOOD ESTIMATES FOR λ/exp(-λ)

                      ATLANTIC                        PACIFIC
              Engineering  Nonengineering    Engineering  Nonengineering
FF  Before    .738/.478     .937/.392         .846/.429     .767/.464
    After     .665/.514     .925/.396         .875/.417    1.026/.359
DD  Before    .621/.537    1.194/.303         .882/.414     .748/.473
    After     .458/.632     .929/.395        1.018/.361    1.093/.335
CG  Before    .669/.512     .818/.441         .964/.381     .542/.582
    After     .475/.622    1.173/.309         .609/.544    1.005/.366

TABLE 18. PEARSON/LIKELIHOOD RATIO CHI-SQUARE (2 d.f.) GOODNESS-OF-FIT
VALUES

                      ATLANTIC                        PACIFIC
              Engineering  Nonengineering    Engineering  Nonengineering
FF  Before    26.5/26.5     18.3/18.0        40.7/38.3     15.7/15.1
    After     19.4/19.6     29.0/29.2        19.2/19.0     25.6/25.2
DD  Before    99.9/56.5     15.8/15.7        19.8/20.6     49.2/35.6
    After     36.5/26.4      9.3/ 9.2        10.4/10.4     26.7/26.5
CG  Before    18.0/22.6     29.1/19.3         9.2/ 7.5     15.9/ 9.1
    After      1.1/ 1.1      9.9/ 8.2        11.3/13.0      2.1/ 2.1

TABLE 19. SIGNIFICANCE (1-p VALUES)

                      ATLANTIC                        PACIFIC
              Engineering  Nonengineering    Engineering  Nonengineering
FF  Before    1.00/1.00     1.00/1.00        1.00/1.00     1.00/1.00
    After     1.00/1.00     1.00/1.00        1.00/1.00     1.00/1.00
DD  Before    1.00/1.00     1.00/1.00        1.00/1.00     1.00/1.00
    After     1.00/1.00      .99/ .99         .99/ .99     1.00/1.00
CG  Before    1.00/1.00     1.00/1.00         .99/ .98     1.00/ .99
    After      .41/ .43      .99/ .98        1.00/1.00      .64/ .65